Abstract

We introduce a new formal framework for checking temporal
safety properties of object-oriented programs. The theoretical foundation
of our framework is an extended class of register automata.
As registers' values can be updated, our automata are able to express 
highly complex properties of systems involving multiple interacting ob- 
jects. Specifications are then used to instrument Java bytecode so that 
their violations can be automatically detected at runtime. The tradeoff 
between the coverage and the overhead of the monitoring system is tun- 
able by means of a number of parameters. We validated our technique by 
checking several safety properties on large open source projects. 



1 Introduction 

Runtime verification is a technique in which the behaviour of programs is
monitored at runtime to check whether executions violate certain safety
properties. Runtime verification sits between classic verification and
testing: like testing, it works on the actual system, but it checks violations
of temporal properties, as is done, for example, in model checking. Compared
to traditional formal verification techniques, runtime verification systems check
only certain program executions. However, the error reports are accurate, and
detected violations represent real bugs in the program.

In the context of object-oriented programming, leading runtime verification
techniques include JavaMOP [22], QVM [4], Tracematches [2], and techniques
based on typestate [29, 9, 23, 15, 6]. Although powerful, these methods have
some limitations: in language expressivity, in the ability to express
concisely properties of the interaction of several objects over time
(typestate), or in concisely expressing properties that require an unbounded
number of different object re-bindings (sometimes called parameters in the
runtime verification community).

Register automata [21] are automata where a number of registers are used to
store and compare data. The values of registers can be updated over time, and 
data can be taken from unbounded domains (e.g., the set of object references). 
Thus register automata provide a powerful device for reasoning about temporal 
relations of a (possibly unbounded) number of objects in a finite manner. They 
have been studied extensively from the theoretical point of view and essential 
results for their analysis have been established (e.g., decidability of emptiness, 
closure under intersection, and so forth). 

Thus, it seems sensible to attempt an approach to runtime verification taking 
register automata as the starting point. The aim is to exploit the flexibility and 
the power of registers to address certain properties not easily dealt with in 
other approaches. On the formal side, we started from register automata and
extended them, driven by typical properties required in real-world
object-oriented systems. The extension has been carefully crafted so that
its precise mathematical connection with register automata can be defined,
while still making it easy for programmers to express properties of their
code. This process has resulted in the definition of two new classes of
automata: a high-level one and a low-level one. Temporal properties of
programs are naturally expressed in the high-level automata. Low-level
automata, on the other hand, simplify the formal correspondence with register
automata. Finally, we defined a formal language, called TOPL (Temporal Object
Property Language), that maps directly onto the high-level automata.

We complement the theoretical construction with a practical tool. We have 
implemented TOPL in a runtime parametric verification framework that can 
be used by Java programmers to express rigorous temporal properties about 
their programs. These properties will then be automatically checked by the 
system. The formal correspondence between the properties and register au- 
tomata defined in this paper gives a rigorous mathematical foundation to the 
approach and allows us to reuse many of the results for register automata on 
TOPL automata. For example, given a TOPL property, we can decide whether 
its language is empty. The formal development is completely hidden from the 
programmer. Our system can be tuned in terms of coverage, overhead, and 
trace reporting by means of a number of parameters. 

Contributions. In summary, the contributions of this paper are: 

• We define TOPL, a formal specification language tailored to express prop- 
erties involving object interactions over time in a way that is familiar to 
object-oriented programmers. 

• We define a precise formal semantics for TOPL, making it suitable for 
program analysis, both static and dynamic. 

• We define a precise formal correspondence between TOPL automata and 
register automata. In particular we prove that they are equally expressive. 
As our proof is constructive, this implies that the decidability results for 
register automata hold also for TOPL automata. Consequently we can
decide several features of TOPL specifications (e.g., language emptiness). 

• We have implemented our parametric framework in a tool that automatically
checks at runtime for violations of TOPL properties in Java programs.
A number of parameters are used to tune the precision of the system. 

• We report on experiments of running our tool on large open-source projects. 
The results are encouraging (e.g., we have found an interesting and previ- 
ously unknown concurrency bug in the DaCapo suite). 

The paper is organized as follows. Section 2 gives some example properties
motivating TOPL. Section 3 provides background on register automata and
defines the basis of our formalism. Section 4 introduces TOPL automata and 
proves the equivalence with register automata. Section 5 defines the TOPL prop- 
erty language and its semantics. Section 6 describes several example properties 
using TOPL. Section 7 describes the implementation of the runtime verification 
system. Section 8 discusses the experimental results and Section 9 related work. 
Finally, Section 10 concludes the paper and describes our plans for future work. 

2 Motivating Examples 

Interaction among objects is at the core of the object-oriented paradigm. Con- 
sider for example Java collections. A typical property one would want to state 
is: 

If one iterator modifies its collection, then other existing iterators of
the same collection become invalid, which means they cannot be used
further.                                                               (1)

The formalization of the above constraint is non-trivial since it needs to keep 
track of several objects (at least two iterators, and one collection) and their 
interaction over time. 
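
As a concrete illustration of property (1), the following sketch uses java.util.ArrayList, whose iterators are documented as fail-fast on a best-effort basis. It is not part of the framework described in this paper; the class name IteratorInvalidation and its method are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IteratorInvalidation {
    // Returns true if using the second iterator, after the first iterator
    // modified the shared collection, is reported as an error.
    public static boolean secondIteratorInvalidated() {
        List<Character> chars = new ArrayList<>();
        chars.add('a');
        chars.add('b');
        chars.add('c');

        Iterator<Character> first = chars.iterator();
        Iterator<Character> second = chars.iterator();

        first.next();
        first.remove();       // the first iterator modifies the collection

        try {
            second.next();    // the second iterator is used after the change
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorInvalidated());
    }
}
```

Here the library itself happens to detect the violation. A specification of property (1) is meant to catch the same pattern for collections that perform no such check.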

A slightly more complex scenario is described by the classes in Figure 1. Class
CharArray manipulates an array of chars and implements the Str interface.
Class Concat is used to concatenate two objects of type Str, and it also
implements Str. Consider the case where Concat is used to implement a rope
data structure.¹ Rope operations such as insert, concat, and delete may update
the shape of the tree and the references to its root. In this case we may have
two (or more) collections sharing some elements. Consequently, iterators
operating on these different collections may invalidate each other. We need to
modify (1), increasing its complexity:

If one iterator modifies its collection, then other existing iterators of
collections sharing some elements become invalid, which means they
cannot be used further.                                                (2)

1 A rope is a binary tree shaped data structure for efficiently storing and manipulating 
very long strings [11]. 
interface Str {
    void set(int i, char c);
    char get(int i);
    int len();
    Itr itr();
}

interface Itr {
    boolean hasNext();
    char next();
    void set(char c);
}

class CharArray implements Str {
    char[] data;
    // ...
}

class Concat implements Str {
    Str one, two;

    public static Concat make(Str one, Str two) { /* ... */ }
    // ...
}



Figure 1: A first example: Java code 



Re-binding of specification variables. Now let's consider the case where 
we want to perform taint checking for input coming from a web form. What we 
want to check is the following property: 

Any value introduced by the input() method should not reach the
sink() method without first passing through the sanitizer() method.   (3)

This property is simple at first sight. However, its difficulty can vary
depending on the context. Consider the case where the input is constructed by
concatenating strings from the web form, for example by using ropes implemented
with class Concat. The number of user inputs, and therefore of concatenations,
is not known a priori and is in general unbounded. Consequently, we will have a
possibly unbounded number of tainted objects. In a temporal specification, we
would then need either a different logical variable for each tainted object,
or the ability to re-bind (or update) variables in the temporal specification
so that, at the semantic level, we capture all the tainted objects. For an
unbounded number of objects, re-binding specification variables to different
values during the computation keeps the specification finite.

The need for re-binding of variables in the specification also arises in other
contexts, for example when reasoning about the evolving shape of dynamically
allocated data structures. Consider the following loop using a list:

while (l.next() != null) {
    // here using the list l in some way...
}

If the list l is circular or pan-handle shaped, the loop may diverge. Being a
violation of a liveness property (termination), divergence cannot be observed
at runtime in finite time and is therefore harder to debug. If we have obtained
the list by calling a third-party library, we would want to check the following
property:²

The shape of the list should be neither circular nor pan-handle. (4) 

We will see in Section 6 that also the encoding of properties of this kind requires 
the ability to update the value of the specification variables. This is because the 
way to encode the property is by detecting, when using the list, that we reach 
an element we have seen before. 

3 Overview and Background 

Figure 2 shows the main concepts of concern. Register automata are well- 
studied devices, introduced decades ago. The gap between TOPL properties 
and register automata is filled by two semantic models, namely high-level and 
low-level TOPL automata. The high-level model (rollback TOPL automata) is 
better suited for implementation; the low-level model (simple TOPL automata) 
is easier to analyze. 

The target of an arrow is at least as expressive as its source. All three 
automata models have the same expressivity. In general, decision problems 
on automata (such as membership, emptiness, language inclusion), are either 
decidable for all three models, or undecidable for all three models. However, 
decidable problems do not necessarily have the same complexity for the three 
models. 

TOPL properties  →  rollback TOPL automata
                              ↕
                     simple TOPL automata
                              ↕
                      register automata

Figure 2: Main concepts 

We review here register automata and several results about them. Together 
with the reductions presented later, these results imply similar results about 
TOPL automata. 

2 Of course, one can write a method (to call before entering the loop) that traverses the
list and checks whether it is circular or pan-handle shaped. However, when the scope of the
list is not clearly defined, these kinds of explicit checks become cumbersome and error-prone.



We begin by defining a generic automaton, of which both TOPL and register
automata are instances. A letter $\ell$ is an $n$-tuple $(v_1, \ldots, v_n)$ of values from a
possibly infinite set $V$; we denote the alphabet by $\Sigma = V^n$. A store $s$ is an
$m$-tuple $(u_1, \ldots, u_m)$ of values; we denote the set of stores by $S = V^m$. A register
$i$ is an integer that identifies a component of the store; the set of registers is
$[m] = \{1, 2, \ldots, m\}$. A guard $g$ is a formula in some logic interpreted over pairs
of letters and stores; we write $(s, \ell) \models g$ to denote that the store $s \in S$ and the letter
$\ell \in \Sigma$ satisfy the guard $g$, and we denote the set of guards by $G$. An action $a$
is a small program acting on stores, which has relational semantics; we denote
the set of actions by $A = \Sigma \to \mathcal{P}(S \times S)$. Thus $a(\ell)$, an action applied to a letter,
is a relation on stores. We will later specialize to specific and simple logics (for
guards) and programming languages (for actions). A label $\lambda$ is a pair $(g, a)$ of a
guard $g$ and an action $a$; we denote the set of labels by $\Lambda = G \times A$.

Definition 1. An automaton $\mathcal{A}$ with $m$ registers over $n$-tuples of values from
the set $V$ consists of

• a finite set $Q$ of states

• an initial state $q_0 \in Q$

• an initial store $s_0 \in S$

• a set $F \subseteq Q$ of final states

• a finite transition relation $\delta \subseteq Q \times \Lambda \times Q$

A configuration $x$ is a pair $(q, s)$ of a state $q$ and a store $s$; we denote the
set of configurations by $X = Q \times S$. The initial configuration is $(q_0, s_0)$. A
configuration is final when its state is final. The configuration graph of the
automaton is a subset of $X \times \Sigma \times X$. We write

$$x_1 \xrightarrow{\ell}_{\mathcal{A}} x_2$$

to mean that $(x_1, \ell, x_2)$ is in the configuration graph of $\mathcal{A}$. If the automaton is
clear from the context, we omit the subscript.

Definition 2. The configuration graph of an automaton consists of exactly
those configuration transitions

$$(q_1, s_1) \xrightarrow{\ell} (q_2, s_2)$$

for which there exists an automaton transition $(q_1, (g, a), q_2) \in \delta$ such that
$(s_1, \ell) \models g$ and $(s_1, s_2) \in a(\ell)$.

An automaton is nondeterministic when its configuration graph contains two
distinct transitions that have the same source $x_1$ and are labeled by the same
letter $\ell$:

$$x_1 \xrightarrow{\ell} x_2 \quad\text{and}\quad x_1 \xrightarrow{\ell} x_3, \quad\text{with } x_2 \neq x_3$$

Otherwise, the automaton is deterministic.

The language $\mathcal{L}(\mathcal{A})$ of an automaton $\mathcal{A}$ is the set of words that label walks
from the initial configuration to some final configuration:

$$\mathcal{L}(\mathcal{A}) = \{\, \ell_1 \ldots \ell_k \mid x_0 \text{ initial},\ x_{i-1} \xrightarrow{\ell_i} x_i \text{ for } i \in [k],\ x_k \text{ final} \,\}$$

Being a walk rather than a path, the configurations $x_0, x_1, \ldots, x_k$ need not be
distinct.

Given a number $m$ of registers, an arity $n$ for letters, and a countable set $V$
of values, the set $A = \Sigma \to \mathcal{P}(S \times S)$ of actions is uncountable. Therefore we must
focus on a subset of $A$ if we want our automata to be finitely representable. We
will only consider actions that are sequences of assignments. The assignment
$(\mathsf{set}\ i := j)$ is defined as follows:

$$\bigl((v_1, \ldots, v_m), (v'_1, \ldots, v'_m)\bigr) \in (\mathsf{set}\ i := j)\bigl((u_1, \ldots, u_n)\bigr)$$

if and only if $v'_k = v_k$ for all $k$, except that $v'_i = u_j$. The sequence operator $;$ is
defined by saying that the relation $(a_1; a_2)(\ell)$ is the composition of the relations
$a_1(\ell)$ and $a_2(\ell)$, for all $\ell$. The identity action $\mathsf{nop}$ satisfies
$\mathsf{nop}(\ell) = \{\, (s, s) \mid s \in S \,\}$ for all $\ell$.
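
Because every assignment determines the new store uniquely, the relation denoted by a sequence of assignments is in fact a function on stores. The following minimal sketch makes this concrete. The class AssignmentActions, its methods, and the encoding of an assignment (set i := j) as the pair {i, j} are assumptions made for this illustration only.

```java
import java.util.Arrays;

class AssignmentActions {
    // Applies (set i := j): copies component j of the letter into register i.
    // Registers and letter components are 1-based, as in the text.
    public static Object[] set(Object[] store, Object[] letter, int i, int j) {
        Object[] next = Arrays.copyOf(store, store.length);
        next[i - 1] = letter[j - 1];
        return next;
    }

    // Applies a sequence of assignments, each given as a pair {i, j},
    // from left to right. The empty sequence behaves like nop.
    public static Object[] sequence(Object[] store, Object[] letter, int[][] assignments) {
        Object[] current = Arrays.copyOf(store, store.length);
        for (int[] a : assignments) {
            current = set(current, letter, a[0], a[1]);
        }
        return current;
    }

    public static void main(String[] args) {
        Object[] store = {"u1", "u2", "u3"};
        Object[] letter = {"v1", "v2"};
        // (set 1 := 2); (set 3 := 1)
        Object[] result = sequence(store, letter, new int[][] {{1, 2}, {3, 1}});
        System.out.println(Arrays.toString(result));
    }
}
```

With the store (u1, u2, u3) and the letter (v1, v2), the sequence (set 1 := 2); (set 3 := 1) yields the store (v2, u2, v1).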

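Register automata work with arity $n = 1$, that is $\Sigma = V^1$. In this case we
write $(\mathsf{set}\ i)$ as a shorthand for $(\mathsf{set}\ i := 1)$. The syntax of guards for register
automata is the following:

$$G_{\mathrm{RA}} ::= \mathsf{fresh} \mid \mathsf{eq}\ i$$

They are interpreted as follows:

$$((u_1, \ldots, u_m), v) \models \mathsf{fresh} \quad\text{iff}\quad u_i \neq v \text{ for all } i$$

$$((u_1, \ldots, u_m), v) \models \mathsf{eq}\ i \quad\text{iff}\quad u_i = v$$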

With these notations we can fix a definition for register automata. 

Definition 3. A register automaton with $m$ registers is an automaton over the
alphabet $\Sigma = V$ whose labels are given by
$\Lambda = \{\, (\mathsf{fresh}, \mathsf{set}\ i) \mid i \in [m] \,\} \cup \{\, (\mathsf{eq}\ i, \mathsf{nop}) \mid i \in [m] \,\}$.

There are several variants of definitions for register automata in the
literature. Our variant corresponds directly to what Neven et al. [24] call 1N-RA (for
'one-way non-deterministic register automata'). 

Example 1. The language $\{\, abc \mid a \neq c \text{ and } b \neq c \,\}$ is recognized by the following
register automaton with 3 registers over the alphabet $\Sigma = V \cup \{\bot\}$:

• $Q = \{1, 2, 3, 4\}$ and $q_0 = 1$ and $s_0 = (\bot, \bot, \bot)$ and $F = \{4\}$

• the set of transitions is

$$\delta = \{\, (1, (\mathsf{fresh}, \mathsf{set}\ 1), 2),\ (2, (\mathsf{fresh}, \mathsf{set}\ 2), 3),\ (2, (\mathsf{eq}\ 1, \mathsf{nop}), 3),\ (3, (\mathsf{fresh}, \mathsf{set}\ 1), 4) \,\}.$$

Observe that: (1) The two parallel transitions from 2 to 3 are needed because
labels cannot have the shape $(\mathsf{true}, \mathsf{set}\ i)$; in fact, we have not even defined yet
the guard $\mathsf{true}$. (2) Only values from $s_0$ may appear multiple times in a store;
in this case, only $\bot$. (3) Register 3 ensures that $\bot$ will not appear in the third
position of words, which is a risk because we extended the alphabet. (4) The
action of the last transition is superfluous, but required by the definition.
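
The automaton of Example 1 is small enough to simulate directly. The sketch below explores the configuration graph depth-first, following Definitions 2 and 3. The class name, the integer encoding of transitions, and the string standing for the extra value are assumptions made for this illustration.

```java
import java.util.Arrays;

class Example1RegisterAutomaton {
    // Stands for the extra value added to V in Example 1.
    static final String BOTTOM = "_|_";

    // Each row is {source, kind, register, target}. Kind FRESH_SET is the
    // label (fresh, set i); kind EQ_NOP is the label (eq i, nop).
    static final int FRESH_SET = 0, EQ_NOP = 1;
    static final int[][] DELTA = {
        {1, FRESH_SET, 1, 2},
        {2, FRESH_SET, 2, 3},
        {2, EQ_NOP,    1, 3},
        {3, FRESH_SET, 1, 4},
    };
    static final int INITIAL = 1, FINAL = 4;

    public static boolean accepts(String[] word) {
        return run(INITIAL, new String[] {BOTTOM, BOTTOM, BOTTOM}, word, 0);
    }

    // Depth-first search for a walk from (state, store) to a final configuration.
    static boolean run(int state, String[] store, String[] word, int pos) {
        if (pos == word.length) return state == FINAL;
        String v = word[pos];
        for (int[] t : DELTA) {
            if (t[0] != state) continue;
            int reg = t[2] - 1;                    // registers are 1-based in the text
            if (t[1] == FRESH_SET) {
                boolean fresh = true;
                for (String u : store) fresh &= !u.equals(v);
                if (!fresh) continue;
                String[] next = Arrays.copyOf(store, store.length);
                next[reg] = v;
                if (run(t[3], next, word, pos + 1)) return true;
            } else if (store[reg].equals(v) && run(t[3], store, word, pos + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts(new String[] {"a", "b", "c"}));
        System.out.println(accepts(new String[] {"a", "a", "c"}));
        System.out.println(accepts(new String[] {"a", "b", "a"}));
    }
}
```

The word aac is accepted through the (eq 1, nop) transition, which is why the two parallel transitions from state 2 to state 3 are both needed.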

Theorem 1. The following properties of register automata are known: 

1. The emptiness problem is coNP-complete [27].

2. The membership problem is NL-complete [24] and NP-complete [27]. 

3. The inclusion, equivalence, and universality problems are decidable for 
2 registers [21, 24]. 

4. The inclusion, equivalence, and universality problems are undecidable in
general [24]. 

5. Register automata are closed under union, intersection, concatenation, 
and Kleene star [21]. 

6. Register automata are not closed under complementation and reversal [21]. 

7. Nondeterministic register automata are strictly more expressive than de- 
terministic register automata [21]. 

8. The language $\mathcal{L}(\mathcal{A}) \cap C^*$ is regular for any register automaton $\mathcal{A}$ and any
finite subset $C \subseteq V$ of values [21].

Some of these properties contrast with analogues for finite alphabets, which 
are better known. For example, automata over finite alphabets are closed under
complementation and reversal, and nondeterminism does not increase
expressivity. These results differ for infinite alphabets because registers can be re-bound
to different values an unbounded number of times. On the other hand, once 
the alphabet is 'sliced' to a finite subset, the resulting language is regular in the 
standard sense. 

These results have analogues for TOPL automata. The details depend on 
what exactly the arrows from Figure 2 are. The gist will generally be a trans- 
formation of the following kind: From an automaton A in one model, construct 
an automaton A' in another model such that 

$$\ell_1 \ldots \ell_k \in \mathcal{L}(\mathcal{A}) \iff f(\ell_1) \ldots f(\ell_k) \in \mathcal{L}(\mathcal{A}')$$

for some fixed $f$ that is easily computable. The interesting ideas will be in how
$\mathcal{A}'$ is built; $f$ handles technicalities.

4 TOPL Automata 

We now gradually move towards a model that is better suited for runtime veri- 
fication of Java programs. 






4.1 Simple TOPL Automata 

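TOPL automata work with the alphabet $\Sigma = V^n$ for some $n \in \mathbb{N}$, unlike register
automata, which force $n = 1$. It is then natural to consider different guards and
actions, which handle specific components of the letters. TOPL guards are the
following:

$$G_{\mathrm{TOPL}} ::= \mathsf{eq}\ i = j \mid \mathsf{neq}\ i \neq j \mid \mathsf{true} \mid G_{\mathrm{TOPL}}\ \mathsf{and}\ G_{\mathrm{TOPL}}$$

If $n = 1$, we write $(\mathsf{eq}\ i)$ instead of $(\mathsf{eq}\ i = 1)$, and $(\mathsf{neq}\ i)$ instead of $(\mathsf{neq}\ i \neq 1)$.
Recall that $(\mathsf{set}\ i)$ abbreviates $(\mathsf{set}\ i := 1)$. The guards are interpreted as follows:

$$((u_1, \ldots, u_m), (v_1, \ldots, v_n)) \models \mathsf{eq}\ i = j \quad\text{iff}\quad u_i = v_j$$

$$((u_1, \ldots, u_m), (v_1, \ldots, v_n)) \models \mathsf{neq}\ i \neq j \quad\text{iff}\quad u_i \neq v_j$$

and

$$(s, \ell) \models \mathsf{true} \quad\text{always}$$

$$(s, \ell) \models g_1\ \mathsf{and}\ g_2 \quad\text{iff}\quad (s, \ell) \models g_1 \text{ and } (s, \ell) \models g_2$$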

We can now define TOPL automata. 

Definition 4. A TOPL automaton is an automaton whose guards are from 
the set Gtopl (defined above) and whose actions are sequences of assignments 
(defined in Section 3). 

Example 2. Consider again the language $\{\, abc \mid a \neq c \text{ and } b \neq c \,\}$. It is
recognized by the following TOPL automaton with 2 registers over the alphabet
$\Sigma = V$:

• $Q = \{1, 2, 3, 4\}$ and $q_0 = 1$ and $F = \{4\}$

• $\delta$ consists of $(1, (\mathsf{true}, \mathsf{set}\ 1), 2)$ and $(2, (\mathsf{true}, \mathsf{set}\ 2), 3)$ and
$(3, (\mathsf{neq}\ 1\ \mathsf{and}\ \mathsf{neq}\ 2, \mathsf{nop}), 4)$.

Any initial store $s_0$ works.
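
Since the automaton of Example 2 is deterministic, it can be simulated with a single loop. The sketch below hard-codes its three transitions; the class and method names are hypothetical, and the initial store is filled with null only because any initial store works.

```java
class Example2ToplAutomaton {
    public static boolean accepts(String[] word) {
        String[] store = {null, null};   // arbitrary initial store
        int state = 1;
        for (String v : word) {
            switch (state) {
                case 1:                  // (true, set 1)
                    store[0] = v;
                    state = 2;
                    break;
                case 2:                  // (true, set 2)
                    store[1] = v;
                    state = 3;
                    break;
                case 3:                  // (neq 1 and neq 2, nop)
                    if (v.equals(store[0]) || v.equals(store[1])) return false;
                    state = 4;
                    break;
                default:                 // state 4 has no outgoing transitions
                    return false;
            }
        }
        return state == 4;
    }

    public static void main(String[] args) {
        System.out.println(accepts(new String[] {"a", "b", "c"}));
        System.out.println(accepts(new String[] {"a", "b", "a"}));
    }
}
```

Compared with Example 1, no extra alphabet value, no third register, and no parallel transitions are needed, because the guard true allows an unconditional assignment.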

It is not difficult to see that TOPL automata are at least as expressive as 
register automata. 

Lemma 1 (RA to TOPL). There exists an algorithm that, given a register
automaton $\mathcal{A}$ over the alphabet $V$, with $m$ registers, $|Q|$ states, and $|\delta|$ transitions,
builds a TOPL automaton $\mathcal{A}'$ over the alphabet $V$, with $m$ registers, $|Q|$ states,
and $|\delta|$ transitions such that

$$w \in \mathcal{L}(\mathcal{A}) \iff w \in \mathcal{L}(\mathcal{A}')$$

Proof. The guard fresh is encoded as the guard neq 1 and ... and neq m. All 
other guards and actions remain unchanged. □ 
It is relatively harder to establish that TOPL automata are at most as
expressive as register automata.

Lemma 2 (TOPL to RA). There exists an algorithm that, given a TOPL
automaton $\mathcal{A}$ over the alphabet $V^n$, with $m$ registers, $|Q|$ states, and $|\delta|$
transitions, builds a register automaton $\mathcal{A}'$ over the alphabet $V'$, with $m + 1$ registers,
$O(n m^{m+1} |Q|)$ states, and $O(n m^{m+2} |\delta|)$ transitions such that

$$\ell_1 \ldots \ell_k \in \mathcal{L}(\mathcal{A}) \iff f(\ell_1) \ldots f(\ell_k) \in \mathcal{L}(\mathcal{A}')$$

where $f((v_1, \ldots, v_n))$ is $v_1 \ldots v_n$ repeated $m + 1$ times, and $V'$ is $V$ with up to
$m - 1$ extra dummy values.

Proof. We describe an encoding chain $\mathcal{A}_1 \to \mathcal{A}_2 \to \mathcal{A}_3 \to \mathcal{A}_4$ that goes from a
TOPL automaton $\mathcal{A}_1$ to a register automaton $\mathcal{A}_4$. The encoding $\mathcal{A}_2 \to \mathcal{A}_3$ is
more difficult than the other two.

Step $\mathcal{A}_1 \to \mathcal{A}_2$. The automaton $\mathcal{A}_2$ works over the alphabet $V$, while $\mathcal{A}_1$ may
work on an alphabet $V^n$ with $n > 1$. The number of registers is preserved. Each
letter $(v_1, \ldots, v_n)$ is transformed into the word $v_1 \ldots v_n$ repeated $m + 1$ times.
Each transition $(q_1, (g, a), q_2)$ is encoded with a path from $q_1$ to $q_2$ of length
$(m + 1)n$. The first $n$ transitions encode the guard $g$: they are labeled $(g_1, \mathsf{nop})$,
..., $(g_n, \mathsf{nop})$. The guard $g_j$ contains the conjunct $(\mathsf{eq}\ i)$ if and only if $g$ contains
the conjunct $(\mathsf{eq}\ i = j)$; similarly for $\mathsf{neq}$. The following $mn$ transitions encode
the action $a$: they are labeled $(\mathsf{true}, a_1)$, ..., $(\mathsf{true}, a_{mn})$. If the last assignment
of $a$ to register $i$ is $(\mathsf{set}\ i := j)$, then $a_{(i-1)m+j}$ is $(\mathsf{set}\ i)$. Otherwise, if $a$ does not
assign at all to register $i$, then $a_{(i-1)m+j}$ is $\mathsf{nop}$.

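The labels of $\mathcal{A}_2$ have one of the following shapes:

• $(\mathsf{eq}\ i_1\ \mathsf{and}\ \mathsf{eq}\ i_2 \ldots \mathsf{and}\ \mathsf{neq}\ i'_1\ \mathsf{and}\ \mathsf{neq}\ i'_2 \ldots, \mathsf{nop})$

• $(\mathsf{true}, \mathsf{set}\ i)$

• $(\mathsf{true}, \mathsf{nop})$

If automaton $\mathcal{A}_1$ has $|Q|$ states and $|\delta|$ transitions, then $\mathcal{A}_2$ has $O(mn|Q|)$ states
and $O(mn|\delta|)$ transitions.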
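
The letter transformation used in this step, which is also the function f of Lemma 2, can be sketched as follows. The class and method names are hypothetical; letters are modelled as lists of values.

```java
import java.util.ArrayList;
import java.util.List;

class LetterEncoding {
    // Encodes one letter (v1, ..., vn) as the word v1 ... vn repeated m + 1 times,
    // where m is the number of registers.
    public static <V> List<V> encodeLetter(List<V> letter, int m) {
        List<V> word = new ArrayList<>();
        for (int copy = 0; copy < m + 1; copy++) {
            word.addAll(letter);
        }
        return word;
    }

    // Encodes a word of letters by concatenating the encodings of its letters.
    public static <V> List<V> encodeWord(List<List<V>> letters, int m) {
        List<V> word = new ArrayList<>();
        for (List<V> letter : letters) {
            word.addAll(encodeLetter(letter, m));
        }
        return word;
    }
}
```

Each encoded letter has length (m + 1)n, which matches the length of the path that encodes one transition: n positions for the guard and mn positions for the action.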

Step $\mathcal{A}_2 \to \mathcal{A}_3$. Again, the number of registers is preserved. For this step,
the transformation of letters is the identity. The automaton $\mathcal{A}_3$ uses guards
from $G_{\mathrm{RA}}$, while $\mathcal{A}_2$ uses guards from $G_{\mathrm{TOPL}}$. Observe that a register
automaton never stores a value twice, because assignments are guarded by $\mathsf{fresh}$. This
is the main obstacle. The key idea of the encoding $\mathcal{A}_2 \to \mathcal{A}_3$ is the following:

Encode a function $r \in [m] \to [m]$ in the state itself such that $r(i)$ is
the register of $\mathcal{A}_3$ that currently simulates the register $i$ of $\mathcal{A}_2$. If two
registers $i \neq j$ of $\mathcal{A}_2$ hold the same value, they are simulated by the
same register $r(i) = r(j)$ of $\mathcal{A}_3$.
Let us develop the idea in some detail. Let $m$ and $Q$ denote the count of
registers and, respectively, the set of states of $\mathcal{A}_2$. The set of states of $\mathcal{A}_3$
is $Q \times ([m] \to [m])$. Each state $q$ of $\mathcal{A}_2$ is encoded with $m^m$ states $(q, r)$
of $\mathcal{A}_3$. The configuration $((q, r), (u_1, \ldots, u_m))$ of $\mathcal{A}_3$ simulates the
configuration $(q, (v_1, \ldots, v_m))$ of $\mathcal{A}_2$ with $v_i = u_{r(i)}$. Moreover, we ensure that no two
registers of $\mathcal{A}_3$ hold the same value; that is, the stores of $\mathcal{A}_3$ are injective, $u_i \neq u_j$
if $i \neq j$. To ensure that the initial store of $\mathcal{A}_3$ is injective we may need to use
up to $m - 1$ dummy values.

Each transition $(q_1, (g, a), q_2)$ of $\mathcal{A}_2$ is encoded into several transitions
$((q_1, r), (g', a'), (q_2, r'))$ of $\mathcal{A}_3$. Here $r$ takes all $m^m$ possible values, and
$g'$, $a'$, and $r'$ are chosen depending on $g$, $a$, and $r$, as described below.

Consider a transition $(q_1, (\mathsf{eq}\ i_1\ \mathsf{and}\ \mathsf{eq}\ i_2\ \mathsf{and} \ldots \mathsf{and}\ \mathsf{neq}\ i'_1\ \mathsf{and}\ \mathsf{neq}\ i'_2\ \mathsf{and} \ldots, \mathsf{nop}), q_2)$.
Let $I = \{i_1, i_2, \ldots\}$ and $I' = \{i'_1, i'_2, \ldots\}$. Note that transitions
labeled by $(\mathsf{true}, \mathsf{nop})$ are a special case. We analyze separately the cases
$I = \emptyset$ and $I \neq \emptyset$.

Case $I = \emptyset$. The guard requires the current letter $\ell$ to be different from
the values in the set $I'$ of registers of $\mathcal{A}_2$. So, $\ell$ must be different from the
values held in the set $r(I')$ of registers of $\mathcal{A}_3$. Because the stores of $\mathcal{A}_3$ have
no repeated value, $\ell$ may be either fresh, or equal to a register outside of the
set $r(I')$. Thus, the encoding consists of several parallel transitions from $(q_1, r)$
to $(q_2, r)$. One is labeled $(\mathsf{fresh}, \mathsf{nop})$, and several are labeled $(\mathsf{eq}\ j, \mathsf{nop})$, one for each
$j \in [m] - r(I')$.

Case $I \neq \emptyset$. The guard also requires that the current letter equals the value
in the set $I$ of registers of $\mathcal{A}_2$. This cannot be true if $|r(I)| > 1$ or $r(I) \cap r(I') \neq \emptyset$:
the encoding in this case is no transition. Otherwise, $r(I) = \{i\}$ for some $i$,
and the encoding is a transition from $(q_1, r)$ to $(q_2, r)$ labeled by $(\mathsf{eq}\ i, \mathsf{nop})$.

Consider a transition $(q_1, (\mathsf{true}, \mathsf{set}\ i), q_2)$. Either the current letter is in some
register or it is in none. To account for the first case, the encoding contains
parallel transitions from $(q_1, r)$ to $(q_2, r'_j)$ labeled by $(\mathsf{eq}\ j, \mathsf{nop})$, for all $j \in [m]$.
Here $r'_j$ is the same as $r$ except that $r'_j(i) = j$. To account for the second case,
the encoding contains a transition from $(q_1, r)$ to $(q_2, r'_j)$ labeled by $(\mathsf{fresh}, \mathsf{set}\ j)$,
for one $j$ chosen as follows. If $r$ is a permutation then $j = r(i)$; otherwise $j$ is
any element of $[m] - r([m])$.

If automaton $\mathcal{A}_2$ has $|Q|$ states and $|\delta|$ transitions, then $\mathcal{A}_3$ has $O(m^m |Q|)$ states
and $O(m^{m+1} |\delta|)$ transitions.

Step $\mathcal{A}_3 \to \mathcal{A}_4$. Transitions labeled by $(\mathsf{fresh}, \mathsf{nop})$, which are produced by the
encoding described so far, are not allowed in register automata. They can be
handled by adding one register, without significantly increasing the number of
transitions. First, each label $(\mathsf{fresh}, \mathsf{nop})$ is transformed into $(\mathsf{fresh}, \mathsf{set}\ m + 1)$.
Second, for each transition labeled $(\mathsf{fresh}, a)$, we add a parallel transition labeled
$(\mathsf{eq}\ m + 1, a)$. Again, the transformation of letters is the identity.

If the automaton $\mathcal{A}_3$ has $m$ registers and $|\delta|$ transitions, then $\mathcal{A}_4$ has
$m + 1$ registers and $O(|\delta|)$ transitions.

This completes the proof. □ 
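
Composing the size bounds of the three steps recovers the bounds stated in Lemma 2. In the derivation below, $Q_k$ and $\delta_k$ are shorthand introduced here for the states and transitions of $\mathcal{A}_k$, with $Q = Q_1$ and $\delta = \delta_1$. The last step only relabels transitions and adds parallel ones, so it leaves the set of states unchanged.

```latex
\begin{align*}
|Q_2| &= O(mn\,|Q|),
  & |\delta_2| &= O(mn\,|\delta|), \\
|Q_3| &= O(m^m\,|Q_2|) = O(n\,m^{m+1}\,|Q|),
  & |\delta_3| &= O(m^{m+1}\,|\delta_2|) = O(n\,m^{m+2}\,|\delta|), \\
|Q_4| &= |Q_3| = O(n\,m^{m+1}\,|Q|),
  & |\delta_4| &= O(|\delta_3|) = O(n\,m^{m+2}\,|\delta|).
\end{align*}
```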
Remark 1. It follows from Lemma 1 and Lemma 2 that simple TOPL automata
and register automata are equally expressive. 

4.2 Automata with Rollback and Skip 

TOPL automata are almost convenient for runtime verification of Java. The 
letters will be events like method calls and returns. Calls and returns hap- 
pen, strictly speaking, at different moments. Moreover, some properties might 
explicitly mention other events that must happen between a call and its corre- 
sponding return. But, consider the constraint that iterators should advance only 
if hasNext returned true. The event 'hasNext was called on the iterator that is 
being tracked' and the event 'hasNext returned true' are distinct. Yet, it is con- 
venient to pretend that two such consecutive events make one composed event. 
This motivates rollback, a mechanism by which transitions consume multiple 
consecutive letters. 

Definition 5. A rollback automaton A consists of: 

• a finite set $Q$ of states; an initial state $q_0 \in Q$; an initial store $s_0 \in S$; a
set $F \subseteq Q$ of final states

• a finite transition relation $\delta \subseteq Q \times \Lambda^* \times Q$,
where $\Lambda$ is a set of labels as defined in Section 3.

The difference between rollback automata and (simple) automata, according
to Definitions 1 and 5, is that rollback transitions have a list of labels
$\bar{\lambda} = \lambda_1 \ldots \lambda_d$, while simple transitions have exactly one label $\lambda$. A transition with
exactly one label is said to be a unit transition. If a rollback automaton has
only unit transitions, then it is also a (simple) automaton.

The semantics of a rollback automaton cannot be described by the configura- 
tion graph from Definition 2, because that definition does not handle non-unit 
transitions. Instead, we define a rollback configuration graph. Perhaps surpris- 
ingly, in the case of simple automata, the rollback configuration graph does not 
coincide with the configuration graph. This discrepancy is explained by the 
different goals of the two models: automata without rollback are meant to be 
easy to analyze, while rollback automata are meant to be convenient for speci- 
fying properties of object-oriented programs. Often the desired behavior for a 
letter is to remain in the same configuration. Without rollback, this behavior 
is achieved by explicit loops in the automaton. The definition of the rollback 
configuration graph reduces the need to explicitly list such skip loops. 

A rollback configuration is a pair $(x, w)$ of a configuration $x$ and a word $w$;
we denote the set of rollback configurations by $Y = X \times \Sigma^*$. We think of $w$ as
yet to be processed. A rollback configuration is initial when its configuration is
the initial configuration; that is, it has the shape $((q_0, s_0), w)$, where $q_0$ is the
initial state of the automaton, and $s_0$ is the initial store of the automaton. A
rollback configuration is final when its state is final and its word is the empty
word; that is, it has the shape $((q, s), \epsilon)$, where $q$ is one of the final states $F$ of
the automaton, and $\epsilon$ is the empty word. The rollback configuration graph is a
subset of $Y \times \Sigma^* \times Y$. We write

$$y_1 \overset{w}{\hookrightarrow}_{\mathcal{A}} y_2$$

to mean that $(y_1, w, y_2)$ is in the rollback configuration graph of $\mathcal{A}$. If the
automaton is clear from the context, we omit the subscript.

The following concepts simplify the definition of the rollback configuration
graph. The (simple) automaton

$$T(s_0; (g_1, a_1), \ldots, (g_d, a_d))$$

is defined to have states $0, 1, \ldots, d$, out of which $0$ is initial and $d$ is final,
transitions $(i - 1, (g_i, a_i), i)$ for all $i \in [d]$, and initial store $s_0$. Recall that
$\ell_1 \ldots \ell_k \in \mathcal{L}(\mathcal{A})$ when there exist (not necessarily distinct) configurations $x_0$, $x_1$,
..., $x_k$ such that $x_0$ is the initial configuration, $x_k$ is a final configuration, and

$$x_{i-1} \xrightarrow{\ell_i} x_i \quad\text{for } i \in [k]$$

When the store of $x_k$ is $s_k$ we say that $\mathcal{A}$ can accept $\ell_1 \ldots \ell_k$ with store $s_k$. A
word may be accepted with multiple stores.

Definition 6. The rollback configuration graph of an automaton consists of two
types of transitions:

1. rollback transitions

$$((q_1, s_1), w w') \overset{w}{\hookrightarrow} ((q_2, s_2), w')$$

when there exists a transition $(q_1, \bar{\lambda}, q_2) \in \delta$ such that $T(s_1; \bar{\lambda})$ can accept
$w$ with store $s_2$

2. skip transitions

$$(x, \ell w) \overset{\ell}{\hookrightarrow} (x, w)$$

when no rollback transition starts from $(x, \ell w)$

The rollback language $\mathcal{L}_\rho(\mathcal{A})$ of an automaton $\mathcal{A}$ is the set of words that
label walks from some initial rollback configuration to some final rollback
configuration:

$$\mathcal{L}_\rho(\mathcal{A}) = \{\, w_1 \ldots w_k \mid y_0 \text{ initial},\ y_{i-1} \overset{w_i}{\hookrightarrow} y_i \text{ for } i \in [k],\ y_k \text{ final} \,\}$$

Since a simple automaton $\mathcal{A}$ is also a rollback automaton, it has both a simple
language $\mathcal{L}(\mathcal{A})$ and a rollback language $\mathcal{L}_\rho(\mathcal{A})$. The treatment of skip transitions
implies the aforementioned discrepancy: the languages $\mathcal{L}(\mathcal{A})$ and $\mathcal{L}_\rho(\mathcal{A})$ are
usually distinct.
Example 3. Consider the automaton consisting of one transition labeled with
the guard $(\mathsf{eq}\ 1)$, from the initial state to the final state. The alphabet is $\Sigma = V$
and the initial store has one register containing value $v$. The simple language of
this automaton consists of one word made of one letter, namely $v$. The rollback
language, on the other hand, consists of all words that contain the letter $v$.
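
The two semantics of Example 3 can be compared with a small simulation. The sketch below is restricted to this one deterministic automaton with a unit transition; the class and method names are hypothetical. Under the rollback semantics, a letter that enables no transition is skipped and the configuration is left unchanged.

```java
class Example3Rollback {
    // One transition INITIAL --(eq 1, nop)--> FINAL.
    static final int INITIAL = 0, FINAL = 1;

    // Returns the successor state on this letter, or -1 if no transition is enabled.
    static int step(int state, String register, String letter) {
        return (state == INITIAL && register.equals(letter)) ? FINAL : -1;
    }

    // Simple semantics: every letter must be consumed by a transition.
    public static boolean acceptsSimple(String register, String[] word) {
        int state = INITIAL;
        for (String letter : word) {
            state = step(state, register, letter);
            if (state < 0) return false;
        }
        return state == FINAL;
    }

    // Rollback semantics for unit transitions: letters that enable no
    // transition are skipped.
    public static boolean acceptsRollback(String register, String[] word) {
        int state = INITIAL;
        for (String letter : word) {
            int next = step(state, register, letter);
            if (next >= 0) state = next;
        }
        return state == FINAL;
    }

    public static void main(String[] args) {
        String[] word = {"a", "v", "b"};
        System.out.println(acceptsSimple("v", word));
        System.out.println(acceptsRollback("v", word));
    }
}
```

For the word avb, the simple semantics rejects at the first letter, while the rollback semantics skips a, takes the transition on v, and then skips b in the final state.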

It is possible, however, to transform TOPL automata such that their rollback 
languages coincide with their simple languages. 

Lemma 3 (TOPL to rTOPL). There exists an algorithm that, given a TOPL
automaton $\mathcal{A}$ with $|Q|$ states, at most $d$ transitions outgoing from each state, and
guards with at most $k$ conjuncts, builds a TOPL automaton $\mathcal{A}'$ with $|Q| + 1$
states and at most $(d + k^d)|Q|$ transitions such that

$$w \in \mathcal{L}(\mathcal{A}) \iff w \in \mathcal{L}_\rho(\mathcal{A}')$$

Proof. For the simple automaton $A$ compare the simple configuration graph
with the rollback configuration graph. Each simple transition $x_1 \to^{\ell} x_2$
corresponds to several rollback transitions $(x_1, \ell w) \hookrightarrow^{\ell} (x_2, w)$, for all $w$. The
rollback graph, however, also has skip transitions $(x, \ell w) \hookrightarrow^{\ell} (x, w)$ for
configurations $x$ that have no outgoing rollback transitions. Thus, the simple language
and the rollback language would agree if all simple configurations had
at least one outgoing simple transition.

We obtain $A'$ from $A$ by adding unit transitions that do not change the
simple language, but ensure that all simple configurations have an outgoing
transition. First we add a special stuck state $q_{stuck}$. Then, for each original
state $q$, we list the guards $g_1, g_2, \ldots, g_d$ of the outgoing transitions. The
configuration $(q, s)$ has no outgoing transition for some letter $\ell$ when $(s, \ell) \not\models g_i$ for $i \in
[d]$. So, we synthesize a guard $g$ that holds exactly in this situation. Informally
we want to add a transition from $q$ to $q_{stuck}$ with the guard $g = \neg g_1 \wedge \ldots \wedge \neg g_d$
and the action nop. There is one small complication: $G_{TOPL}$ does not have a
negation operator. However, we can negate the simple guards eq and neq, we
can use distributivity to put $g$ in disjunctive normal form, and we can simulate
disjunction by parallel transitions. Thus, if each $g_i$ contains up to $k$ simple
conjuncts, we add at most $k^d$ transitions from state $q$. □
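
The guard synthesis used in this proof can be sketched as follows. This is only an illustration under assumed encodings: a guard is a list of simple conjuncts, and a conjunct is a triple (op, register, component) with op being eq or neq. The names negate, stuck_guards, holds, and GUARDS are hypothetical.

```python
from itertools import product


def negate(conjunct):
    """Negate a simple conjunct: eq becomes neq and vice versa."""
    op, reg, comp = conjunct
    return ("neq" if op == "eq" else "eq", reg, comp)


def stuck_guards(guards):
    """Disjuncts of the DNF of (not g_1) and ... and (not g_d).

    not (c_1 and ... and c_k) is (not c_1) or ... or (not c_k); distributing
    the outer conjunction picks one negated conjunct from each guard, which
    gives at most k^d disjuncts, one per parallel transition to the stuck state.
    """
    return [[negate(c) for c in choice] for choice in product(*guards)]


def holds(guard, store, letter):
    """Evaluate a conjunction on a store and a letter (1-based indices)."""
    return all((store[r - 1] == letter[c - 1]) == (op == "eq")
               for op, r, c in guard)


# Two outgoing guards: (eq 1 = 1 and eq 2 = 2), and (neq 1 = 1).
GUARDS = [[("eq", 1, 1), ("eq", 2, 2)], [("neq", 1, 1)]]
```

Some generated disjuncts may be unsatisfiable (for example, one that contains both eq 1 = 1 and neq 1 = 1); they are harmless and could be pruned.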

The converse transformation is more difficult. There are many obvious ways 
to simulate a rollback TOPL automaton with a simple TOPL automaton, and 
many of them are wrong. Note, for example, that when the set V of values 
is countable, the simple configurations are countable, but the rollback config- 
urations are uncountable. Rollback, which ensures that several letters occur 
atomically, and skip, which filters irrelevant letters, usually lead to intuitive 
behaviors, but not always, as the following example illustrates. 

Example 4. Consider the following rollback TOPL automaton with 2 registers
over the alphabet $\Sigma = V = \{A, B\}$:

• $Q = \{1, 2, 3\}$ and $q_0 = 1$ and $s_0 = (A, B)$ and $F = \{3\}$
• $\delta$ consists of $(1, [(eq\ 2, nop), (eq\ 1, nop), (eq\ 2, nop)], 2)$ and $(1, [(eq\ 1, nop)], 3)$

The rollback language of this automaton consists of those words in which the
first A is not surrounded by two Bs. For example it contains A, BA, BAA, and
BAAB, but not BAB or BBAB.
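
The claims of Example 4 can be checked with a minimal simulator of the rollback configuration graph. This is a sketch under simplifying assumptions: letters are single values, guards are pairs (op, register), and every action is nop as in the example. The names run_transition, rollback_accepts, and DELTA are hypothetical.

```python
def run_transition(store, steps, letters):
    """Return the resulting store if the steps accept the letters, else None."""
    for (op, reg), letter in zip(steps, letters):
        if (store[reg - 1] == letter) != (op == "eq"):
            return None
        # all actions in Example 4 are nop, so the store never changes
    return store


def rollback_accepts(word, delta, q0, s0, final):
    """Search the rollback configuration graph for a final configuration."""
    todo = [(q0, s0, tuple(word))]
    while todo:
        q, s, w = todo.pop()
        if not w:
            if q in final:
                return True
            continue
        successors = []
        for src, steps, dst in delta:
            if src == q and len(steps) <= len(w):
                s2 = run_transition(s, steps, w[:len(steps)])
                if s2 is not None:
                    successors.append((dst, s2, w[len(steps):]))
        if not successors:
            # skip transition: allowed only when no rollback transition starts here
            successors.append((q, s, w[1:]))
        todo.extend(successors)
    return False


DELTA = [
    (1, [("eq", 2), ("eq", 1), ("eq", 2)], 2),  # reads B A B, moves to state 2
    (1, [("eq", 1)], 3),                        # reads A, moves to final state 3
]
```

On BAB the three-letter transition is enabled in the initial configuration, so the leading B cannot be skipped and the run ends in the non-final state 2.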

The key idea for encoding rollback TOPL automata into simple TOPL au- 
tomata is the following: 

It is sufficient to simulate rollback configurations with words shorter
than the length $d$ of the longest transition, because both rollback and
skip transitions depend on $\le d$ letters.

If $|w| < d$, it should be possible to encode a rollback configuration $((q, s), w)$
using the finite memory of a simple TOPL automaton. To decide which transitions
to take we have $|w| + 1$ letters, the ones stored in the current configuration
plus the current letter, and $m$ registers to consider. If $|w| + 1 < d$, then the
current letter $\ell$ is simply stored in the current configuration, and we move to (the
encoding of) $((q, s), w\ell)$. Eventually, $d$ letters are available and all transitions
outgoing from $q$ can be evaluated. The original guards involve comparing letter
components with register values. Now some of the letter components are themselves
in registers, and TOPL automata do not have guards that compare registers.
But we can ensure that the stores are injective (as in the translation from
TOPL automata to register automata) and then the information about which
registers equal which registers is statically available. Actions simply translate
to substitutions into later guards on the same transition.

Lemma 4 (rTOPL to TOPL). There exists an algorithm that, given a rollback
TOPL automaton $A_\rho$ over the alphabet $V^n$ with $m_\rho$ registers, $|Q_\rho|$ states, and
$|\delta_\rho|$ transitions of length $\le d$, builds a simple TOPL automaton $A$ with $m =
m_\rho + dn$ registers, $O(d(d+1)(m+1)^m |Q_\rho|)$ states, and $O(d(d+1)(m+1)^{(m+n)} |\delta_\rho|)$
transitions such that

$w \in \mathcal{L}_\rho(A_\rho) \iff w\, \underbrace{\ell_\perp \ldots \ell_\perp}_{d+1 \text{ times}} \in \mathcal{L}(A)$

where $\ell_\perp$ is an extra dummy letter.

Proof. The set of states of $A$ is

$Q = Q_\rho \times \{0, \ldots, d\} \times \{0, \ldots, d-1\} \times ([m] \rightharpoonup [m])$

When the rollback configuration

$(q_\rho, (v_1, \ldots, v_{m_\rho})),\ \ell_0 \ldots \ell_{h-1}$

of $A_\rho$ is reached, one of the simple configurations

$(q_\rho, h, k, r),\ (u_1, \ldots, u_m)$

of $A$ is reached. Here $h$ is the length of the remembered history, while $k$, $r$, $u_1$,
$\ldots$, $u_m$ vary subject to the following constraining invariants:

• the stores of $A$ are injective: $u_i \ne u_j$ if $i \ne j$

• for $i \in [m_\rho]$, the value of register $i$ of $A_\rho$ is stored in register $r(i)$ of $A$;
that is, $u_{r(i)} = v_i$

• for $0 \le k' < h$ and $\ell_{k'} = (v'_1, \ldots, v'_n)$ and $1 \le i' \le n$, the $i'$th component $v'_{i'}$
of letter $\ell_{k'}$ is stored in register $r(\iota(k + k', i'))$ of $A$, where $\iota(k'', i') =
m_\rho + (k'' \bmod d)\,n + i'$; that is,

$u_{r(\iota(k+k',\, i'))} = v'_{i'}$

• $r$ is undefined for register slots that are reserved for storing letters but are
currently unused; namely, $r(\iota(k + k', i'))$ is undefined for all $i'$ and $k'$ such
that $1 \le i' \le n$ and $h \le k' < d$
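
The slot function in the third invariant behaves like an index into a circular buffer of d letters placed after the first m_ρ slots. A minimal sketch, assuming the reading ι(k'', i') = m_ρ + (k'' mod d) n + i' of the definition; the Python name iota and its argument order are hypothetical.

```python
def iota(k2, i, m_rho, d, n):
    """Slot of component i (1..n) of the letter at circular position k2."""
    return m_rho + (k2 % d) * n + i
```

With m_ρ = 2, d = 3 and n = 2, the letter slots are exactly 3, ..., 8, that is, m_ρ + 1, ..., m_ρ + dn, and position 3 wraps around to the slots of position 0.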

We consider in turn each state $q = (q_\rho, h, k, r)$. Its outgoing transitions are
determined by the outgoing transitions of $q_\rho$.

Case $h < d$. In this case all outgoing transitions of $q$ simply record the
current letter $\ell = (v_1, \ldots, v_n)$. To maintain the injectivity of stores, only those
components of $\ell$ that are not already in some register must be stored. We
consider $(m + 1)^n$ distinct situations: each of the $n$ components may be in one of
the $m$ registers, or it may be fresh. Such a situation is described by a partial function
$p \in [n] \rightharpoonup [m]$. We add to $A$ a transition

$((q_\rho, h, k, r),\ (g_p, a_p),\ (q_\rho, h + 1, k, r_p))$

The guard $g_p$ is constructed such that it ensures we are indeed in a situation
described by $p$; the action $a_p$ stores the fresh values of $\ell$ somewhere outside of
$r([m])$; the function $r_p$ records where the fresh values were stored and where the
existing values already were.

The guard $g_p$ is constructed as follows. If $p(j)$ is undefined, which means
that $v_j$ should be fresh, then $g_p$ contains conjuncts (neq $i = j$) for $i \in [m]$. If
$p(j)$ is defined, which means that $u_{p(j)} = v_j$, then $g_p$ contains the conjunct
(eq $p(j) = j$). These are all the conjuncts of $g_p$.

We now fix some injection $\sigma$ from the set $\{v_j \mid p(j) \text{ undefined}\}$ of fresh values
to the set $[m] - r([m])$ of unused registers. The action $a_p$ contains an assignment
(set $\sigma(v_j) := j$) for each $j$ where $p(j)$ is undefined. Also

$r_p(\iota(k', i')) = \begin{cases} p(i') & \text{if } k' = k + h \text{ and } p(i') \text{ defined} \\ \sigma(v_{i'}) & \text{if } k' = k + h \text{ and } p(i') \text{ undefined} \\ r(\iota(k', i')) & \text{otherwise} \end{cases}$

Case $h = d$. At this point the values in the $m$ registers of $A$ are enough to
decide whether to simulate a rollback or a skip transition of $A_\rho$. (Note that we
could use the current letter and decide one step earlier, when $h = d - 1$, which
transitions to take. But that alternative is slightly more complex.) First we
encode the transitions outgoing from $q_\rho$,

$(q_\rho, [(g_0, a_0), \ldots, (g_{d'-1}, a_{d'-1})], q')$ with $d' \le d$

and later we will encode the skip configuration transition.

Consider an assignment (set $i := j$) that appears in $a_{k'}$. Suppose that the
distribution of values just before this assignment is given by $r'$. This means
that the register $i$ of $A_\rho$ is simulated by $r'(i)$, and that the $j$th component of
letter $\ell_{k'}$ is simulated by $r'(\iota(k + k', j))$. After the assignment is executed, the
distribution of values is given by

$r'[\, i \mapsto r'(\iota(k + k', j)) \,]$

Thus, it is possible to find the register distribution function after each of the
$d'$ steps. Let us write $r_{k'}$ for the distribution function just before step $k'$; in
particular, $r_0 = r$, where $r$ is given by the state $q$ of $A$. Suppose now that $g_{k'}$
contains the conjunct (eq $i = j$). We can evaluate this conjunct statically by
checking whether $r_{k'}(i) = r_{k'}(\iota(k + k', j))$. Similarly, we can evaluate all guards
$g_0, g_1, \ldots, g_{d'-1}$. If some guard does not hold, then the encoding is nothing.
If all guards hold, then we add outgoing transitions. These transitions must
also save the current letter, so the whole process from Case $h < d$ is repeated.
However, that process is applied as if the starting state has the value distribution
function $r_{d'}$, rather than $r$; as if the history is $h - d'$, rather than $h$; and as if
the reference point is $(k + d') \bmod d$, rather than $k$.

If the state $q = (q_\rho, d, k, r)$ still has no outgoing transitions at this point, it
is because none of the outgoing transitions of $q_\rho$ have guards that all hold. In
this case, we must simulate a skip transition. This is done as in Case $h < d$,
except we first simulate removing the letter to which $k$ points. We do so by
incrementing $k$ and decrementing $h$ and replacing the distribution function $r$ by
one that is undefined at $\iota(k, i')$ for $i' \in [n]$.

Finally, we add one final state, and transitions to it from the states with
$h = d$. These transitions have a guard that requires all $d$ saved letters to
be $\ell_\perp$. □

Remark 2. It follows from Lemma 3 and Lemma 4 that rollback TOPL automata
and simple TOPL automata are equally expressive.

4.3 Properties

The following are consequences of Theorem 1 and Lemmas 1, 2, 3, and 4.

Theorem 2. Register automata, simple TOPL automata, and rollback TOPL
automata share the following properties:

1. The emptiness and the membership problems are decidable.

2. The language inclusion, the language equivalence, and the universality
problems are decidable for 1 register.

3. The language inclusion, the language equivalence, and the universality
problems are undecidable in general.

4. The languages of these automata are closed under union, intersection,
concatenation, and Kleene star.

5. The languages of these automata are not closed under complementation.

5 TOPL Properties 

In this section, we describe the user level language of TOPL properties, as well as 
the translation from TOPL properties to rollback TOPL automata in Figure 2. 
This language provides a user-friendly way to write down certain rollback TOPL 
automata. As such, TOPL properties form a subset of rollback TOPL automata, 
and the translation in Figure 2 is morally an injection. There is, however, in 
the interest of user-friendliness, some encoding happening. 

Syntactically, a TOPL property has a set of transitions. 3 Each transition 
is a labelled arc (directed edge) with a source and a target vertex identified by 
their name. There must be a vertex named start and one named error. Labels 
look like method calls. 

In the following, we will describe TOPL labels and how they translate to 
guards and actions. But first we will describe the events that a TOPL prop- 
erty considers and hence the alphabet that the corresponding rollback TOPL 
automaton works over. 

5.1 The Alphabet 

We model the semantics of programs and properties as sets of event traces. We 
say that a program violates a property when their sets of traces intersect. In 
other words, properties encode bad executions, rather than good executions. 

The set of event traces that a property P deems bad is thus the language of
the corresponding automaton $A_P$. Observable events for
TOPL properties are method calls with parameter values and a time-tag. The
method name is taken to be a string literal and an arity (for simplicity we do not
distinguish overloaded methods with the same arity). The time-tag is either call
or ret, corresponding to observing the method being called or returned from. If
the time-tag is ret, the event carries the return value rather than any parameter
values.

Thus, for $A_P$, the values of interest come from two sources: event ids and
parameter values. The set $E$ of event ids is given by the grammar

call $m^k$ | return $m^k$

where $m^k$ is one of the method names mentioned in P (with arity $k$).

The set $V_L$ of possible parameter values is the set of values from the
programming language (e.g., Java) plus a dummy value $\perp$.

3 For the complete BNF grammar for TOPL we refer to [18]. 
The values are then $V = E + V_L$, and the alphabet is $V^{n+2}$, where $n$ is the
maximal arity of a method mentioned in P. (We need one extra component for the
return value and one for the event id.)

We are only concerned with the subset $E \times V_L^{n+1}$, i.e., the tuples where exactly
the first component is an event id. If P has a maximal arity of 5, the event
call $m^3(a, b, c)$ would be understood as (call $m^3$, $\perp$, $a$, $b$, $c$, $\perp$, $\perp$) by $A_P$. Here
the first component is the event id, the second is a filler for the return value,
the next three are the parameter values and the rest are fillers which allow all
tuples to have the same length. The event ret $r := m^3$ would be understood as
(ret $m^3$, $r$, $\perp$, $\perp$, $\perp$, $\perp$, $\perp$) by $A_P$.
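
The padding scheme just described can be sketched as follows. This is only an illustration: event ids are modelled as Python triples, the dummy value ⊥ is modelled as None, and the function names encode_call and encode_ret are hypothetical.

```python
BOTTOM = None  # stands for the dummy value ⊥


def encode_call(method, arity, args, max_arity):
    """Letter for a call event: (event id, ⊥, parameters..., ⊥ fillers)."""
    assert len(args) == arity <= max_arity
    fillers = (BOTTOM,) * (max_arity - arity)
    return (("call", method, arity), BOTTOM, *args) + fillers


def encode_ret(method, arity, result, max_arity):
    """Letter for a return event: (event id, return value, ⊥ fillers)."""
    return (("ret", method, arity), result) + (BOTTOM,) * max_arity
```

Every letter has n + 2 components, where n is the maximal arity; with n = 5 this gives tuples of length 7, as in the example above.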

We can now describe the labels of P and explain how they are transformed
into guards and actions of $A_P$.

5.2 Guards and Actions from Labels 

Labels refer to the registers via patterns. To retain the flavour of a programming
language, we call the registers of P property variables. For each property variable
v there are three associated patterns. The uppercase pattern V matches any
value and writes it in the property variable v. The lowercase pattern v reads
the value of the property variable v and only matches that value. The negated
lowercase pattern !v reads the value of the property variable v and only matches
different values.

Further, every element of $V$ acts as a pattern that matches only the value it
denotes, and a wildcard * pattern matches any value.

A TOPL property is well-formed when it satisfies the following two conditions:
(i) labels must contain uppercase value patterns at most once; (ii) any
use of a lowercase pattern must be preceded by a use of the corresponding
uppercase pattern on all paths from start (i.e., property variables must be
written before being read). From now on we assume TOPL properties to be
well-formed.

A label is of one of three forms:

$l ::= \text{call } m^k(x_1, \ldots, x_k) \mid \text{ret } x := m^k \mid x := m^k(x_1, \ldots, x_k)$

where $x$ and $x_1, \ldots, x_k$ are all patterns. The first two forms describe observable
events and will be translated into unit transitions. We describe that translation
first and treat the last form at the end.

In order to match elements from $V$, there is an extra register in $A_P$ for
each unique element mentioned by P (note that this includes all the method
names). This register contains that value in the initial state of $A_P$ and is never
overwritten.

For each pattern $x$ which is not *, we define three functions: reg($x$) $\in \mathbb{N}$ gives
the associated register, pred($x$) $: \mathbb{N} \to G$ gives the corresponding predicate, and
act($x$) $: \mathbb{N} \to A$ gives the action, i.e.

x          pred(x)(j)          act(x)(j)
V          true                set reg(v) := j
v          eq reg(v) = j       nop
!v         neq reg(v) = j      nop
$a \in V$  eq reg(a) = j       nop
*          true                nop

We can now interpret labels. For a label $l$, we define

$[\![l]\!] = ([\![l]\!]_G, [\![l]\!]_A)$

where $[\![-]\!]_G$ is given by

$l$                                $[\![l]\!]_G$
call $m^k(x_1, \ldots, x_k)$       pred($m^k$)(1) $\wedge$ pred($x_1$)(3) $\wedge \ldots \wedge$ pred($x_k$)($k + 2$)
ret $x := m^k$                     pred($m^k$)(1) $\wedge$ eq reg($x$) = 2

and $[\![-]\!]_A$ is given by

$l$                                $[\![l]\!]_A$
call $m^k(x_1, \ldots, x_k)$       act($x_1$)(3); $\ldots$; act($x_k$)($k + 2$)
ret $x := m^k$                     act($x$)(2)

If P is well-formed all the writes will be to different registers and thus commute.

Let us now consider the third label

$x := m^k(x_1, \ldots, x_k)$

Notice how the right hand side refers to call time, while the left hand side refers
to return time. We decided that this label should mean that $m^k$ was called with
parameters matching $x_1, \ldots, x_k$ and returned a value matching $x$, and no event
was observed in the meantime. This is because an intermediate call, for instance
a recursive call, could disconnect the method call and the return value. This
label makes use of the rollback feature and translates to a transition of depth
two:

$[\![x := m^k(x_1, \ldots, x_k)]\!] = [\, [\![\text{call } m^k(x_1, \ldots, x_k)]\!],\ [\![\text{ret } x := m^k]\!] \,]$

Initial State. All that remains is to describe the initial state of $A_P$. As
mentioned, the registers which correspond to values in $V$ contain those values in
the initial state. Note that the generated actions never overwrite these registers.
Registers which correspond to property variables contain $\perp$. If P is well-formed,
$\perp$ will never be read from those registers.
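
The translation of a call label into a guard and an action, as given by the pred and act tables, can be sketched as follows. The encoding is an assumption made for illustration: patterns are strings (an uppercase letter writes, a lowercase name reads, a leading ! negates, * is the wildcard), reg is a dictionary from variable names and event ids to register numbers, and the names pred_act and translate_call are hypothetical.

```python
def pred_act(x, j, reg):
    """Guard conjuncts and actions for pattern x at letter component j."""
    if x == "*":
        return [], []                          # true, nop
    if x.startswith("!"):
        return [("neq", reg[x[1:]], j)], []    # neq reg(v) = j, nop
    if x.isupper():
        return [], [("set", reg[x.lower()], j)]  # true, set reg(v) := j
    return [("eq", reg[x], j)], []             # eq reg(v) = j, nop


def translate_call(method, patterns, reg):
    """Guard and action for the label: call method(patterns...)."""
    # Component 1 holds the event id; parameters start at component 3
    # because component 2 is reserved for the return value.
    guard = [("eq", reg[("call", method)], 1)]
    action = []
    for j, x in enumerate(patterns, start=3):
        g, a = pred_act(x, j, reg)
        guard += g
        action += a
    return guard, action
```

For a label call m(x, !y, Z, *) with registers 1 to 4 assigned to the event id, x, y and z, the guard is eq 1 = 1, eq 2 = 3, neq 3 = 4, and the action is set 4 := 5.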
Directives. To ease the task of writing TOPL properties and to increase their
usefulness, two directives were added to the language: a prefix directive for
convenience when specifying method names and an observe directive for
pre-filtering events. The prefix directive prefix p produces, from every method
name pattern $m^k$, an extra method pattern $pm^k$. It further produces, from any
transition involving $m^k$, a similar transition involving $pm^k$. The observe directive
observe p has the effect that only method calls which match p are considered
part of the event trace.



6 Examples 

In this section, we formalize some temporal properties in TOPL. We refer the 
reader to [18] for a more comprehensive exemplification of properties. 

Iterators. Property (1) in Section 2 can be formalized as follows: 

property InvalidateOtherIterators
  observe <java.util.{Collection,Iterator}.*>
  prefix <java.util.{Collection,Iterator}>
  start -> start: *
  start -> one: X := C.iterator()
  one -> one: *
  one -> two: Y := c.iterator()
  two -> yBad: x.remove()
  two -> xBad: y.remove()
  yBad -> error: call y.*[*]
  xBad -> error: call x.*[*]

This property describes the interaction of three objects. The first line is an
observe directive declaring that we are interested in all method calls (denoted
by the wild card *) on instances of java.util.Collection and java.util.Iterator
and all those that override them. Figure 3 shows a graphical representation of
the property.



Heap Shape and Values Sensitive Properties. The following is a TOPL
encoding of the property (4) in Section 2. It reports an error if a list is cyclic
or it has the panhandle shape (i.e., there is a lasso at some point). Directives
are omitted.

property ListNotCyclic
  start -> start: *
  start -> a: X := *.getList()
  a -> a: X := x.next()
  a -> b: Y := x.next()
  b -> b: Y := y.next()
  b -> error: x := y.next()
  a -> error: x := x.next()


Figure 3: Property InvalidateOtherIterators

The idea is that this property will bind the property variable x to any possible
object in the list, and y to any possible successor (via the next field) of
the current binding of x. Therefore, if there is a lasso in the list, this will be
detected when a new binding of y via b -> b becomes equal to the binding of
x. The transition a -> error detects the case where there is an object pointing
to itself. This property shows the importance of re-binding for specification
variables. This phenomenon is also shown by the following property, which
detects when a dictionary overwrites one of its bindings.

property BadDictionary
  message "dictionary overwrites its bindings"
  observe <Dictionary.*>
  prefix <Dictionary>
  start -> start: *
  start -> written: D.put(K, V)
  written -> written: d.put(k, V)
  written -> error: !v := d.get(k)

The overwrite is detected by the guard !v, which checks whether the value returned
for a key k differs from the binding recorded in the property variable v.
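
To illustrate how such a property is monitored, here is a minimal sketch of a nondeterministic monitor specialised to BadDictionary. It is not the TOPL checker: events are simplified to completed calls (name, dictionary, key, value), where value is the argument of put or the result of get, values are compared with Python equality, and the function name violates is hypothetical.

```python
def violates(trace):
    """Return True if some run of the BadDictionary automaton reaches error."""
    states = {("start",)}
    for name, d, k, val in trace:
        nxt = set()
        for state in states:
            if state == ("start",):
                nxt.add(state)                       # start -> start: *
                if name == "put":
                    nxt.add(("written", d, k, val))  # start -> written: D.put(K, V)
            else:
                _, d0, k0, v0 = state
                if (d, k) != (d0, k0):
                    nxt.add(state)                   # nothing enabled: skip event
                elif name == "put":
                    nxt.add(("written", d, k, val))  # written -> written: d.put(k, V)
                elif val != v0:
                    return True                      # written -> error: !v := d.get(k)
                else:
                    nxt.add(state)                   # get returned v: skip event
        states = nxt
    return False
```

The re-binding of V on the self-loop is what lets a later put on the same key update the expected value, so that only a get returning something other than the latest put is reported.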

Iterating Ropes Data Structures. The property (2) from Section 2 can be
encoded in TOPL as follows:

property IterateRopes
  observe <Itr.next, Concat.make, Str.itr>
  prefix <Itr, Concat, Str>
  start -> start: *
  start -> a: I := S.itr()
  a -> a: *, S := make(s, *), S := make(*, s)
  a -> b: J := s.itr()
  b -> c: i.set(*)
  c -> error: j.next()
  b -> d: j.set(*)
  d -> error: i.next()

The re-binding in the self-loop a -> a, combined with a -> a: *, ensures that
iterators of sub-collections in the rope are tracked. Therefore, given two different
iterators of two collections sharing some object, modifying through one and then
advancing the other leads to the error state. Are there other permutations missed
by the property?

Property for Taint Info. To check taint information as described by prop- 
erty (3) in Section 2 we can use the following TOPL specification: 

property TaintCheck
  observe <Itr.next, Concat.make, Str.itr>
  prefix <Itr, Concat, Str>
  start -> start: *
  start -> a: X := input()
  a -> a: X := make(x, *), X := make(*, x), (!sani)(*)
  a -> b: sani(x)
  a -> error: sink(x)

The specification takes care to track any tainted substring x used in the rope 
and report an error if x reaches the sink method. 

Singleton Pattern. The last property we present concerns the singleton pat- 
tern where the instantiation of a class is restricted to one object. It is used 
when exactly one object is needed to coordinate actions across the system. It 
can be enforced by the following. 

property SingletonPattern
  start -> a: I := getInstance()
  a -> error: !i := getInstance()

The property produces an error if the object obtained by the method getInstance()
is not the same as the one already bound to the property variable i.

7 Implementation 

We have implemented the formal framework in a tool 4 for checking whether
a Java program violates TOPL properties. The tool consists of two parts: a
compiler and a checker. The TOPL compiler (toplc) instruments Java bytecode;
the TOPL checker (class topl.Checker) monitors the execution and reports
violations. Figure 4 depicts how the compiler and the checker fit together, thus
summarizing our runtime checking system.

The TOPL Checker. The checker logs the property violations it detects. 
Internally it maintains a set of active states. New states are produced by apply- 
ing an action to an existing state (called the parent). For error reporting, each 
state keeps track of its parent. The implementation takes advantage of this fact 
by representing stores using persistent sets for bindings; more precisely, we use 
treaps [3]. 

As states point to their parent, we need some means to control overhead. This
is done by a configurable parameter for the history length h in Property.java.

4 http://rgrig.github.com/topl



Figure 4: Architecture of the TOPL tool

We should then allow the Java VM to garbage collect states that are not reachable
in h steps from some active state. To do that, after each execution of a
transition we could do a BFS traversal with h steps and set to null the parent
pointer of fringe states. However, for n active states, the traversal would take
O(nh) time. The procedure for executing transitions takes O(nd) time, where d
is the maximum out-degree of the property. So, the checker would become much
slower if h > d, which usually is the case. One solution could be to perform
the BFS once in every h/d steps. The solution we implemented tries to better
address the average case. We remember the number m of states seen by a BFS,
and perform the next BFS after roughly m new states were created.
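
The pruning scheme can be sketched as follows. This is an illustration in Python rather than the Java code of the checker; the classes State and Pruner, the function truncate_history, and the exact scheduling rule are assumptions that follow the description above.

```python
class State:
    """A checker state that remembers the state it was produced from."""

    def __init__(self, parent=None):
        self.parent = parent


def truncate_history(active, h):
    """Cut parent pointers h steps behind the active states.

    Returns the number of states seen by the traversal.
    """
    seen = {id(s): s for s in active}
    level = list(seen.values())
    for _ in range(h):
        nxt = []
        for s in level:
            p = s.parent
            if p is not None and id(p) not in seen:
                seen[id(p)] = p
                nxt.append(p)
        level = nxt
    for s in level:          # fringe: exactly h steps from an active state
        s.parent = None      # older states become garbage collectable
    return len(seen)


class Pruner:
    """Runs the traversal only after about m new states were created."""

    def __init__(self, h):
        self.h, self.budget, self.created = h, 1, 0

    def state_created(self, active):
        self.created += 1
        if self.created >= self.budget:
            self.budget = max(1, truncate_history(active, self.h))
            self.created = 0
```

Because the next traversal is delayed until as many states were created as the previous traversal visited, its cost is amortised over the work of creating those states.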

The set of active states in nondeterministic automata can be unbounded.
Therefore, as for the case of trace lengths, we impose a bound on the number of
active states. The bound can be configured by the user in the Property.java
file. The strategy for forgetting states can also be chosen among: random,
newest, and oldest. Newest/oldest refer to a time attached to states. The time
of a state is the number of transitions taken to reach that state.
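
A bound with a forgetting strategy could look like the sketch below. It assumes that the strategy name says which states are forgotten (newest forgets the states with the largest time); the text does not spell this out, so this reading, the function name forget, and the representation of states as (time, state) pairs are all assumptions.

```python
import random


def forget(states, bound, strategy, rng=random):
    """Keep at most `bound` of the (time, state) pairs in `states`."""
    if len(states) <= bound:
        return list(states)
    if strategy == "random":
        return rng.sample(states, bound)
    by_time = sorted(states, key=lambda pair: pair[0])
    if strategy == "newest":   # forget the newest, keep the oldest
        return by_time[:bound]
    if strategy == "oldest":   # forget the oldest, keep the newest
        return by_time[len(by_time) - bound:]
    raise ValueError("unknown strategy: " + strategy)
```

With a bound of 0 no state is tracked at all, which corresponds to measuring the cost of the instrumentation alone.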

The TOPL Compiler. The compiler takes as input a Java project and a
set of TOPL properties. The Java project is defined to be all files in a given
directory, recursively descending into jar and zip files as well as subdirectories.
The compiler produces (1) a copy of the Java project with the class files
instrumented, and (2) a TOPL automaton. Only the Java methods relevant to
the given TOPL properties are instrumented, but taking into consideration the
inheritance tree of the project. For example, suppose that a TOPL property
mentions the method next from the interface Iterator. Then all the methods that
implement Iterator.next are instrumented and matched by the automaton.
In more detail, the compiler executes five phases. First, TOPL properties are
converted into a simpler, intermediate format. Second, the compiler statically
checks that the property is well-formed (e.g., variable writes must precede
reads). We plan to also statically check whether the automaton can never fail.
Third, all properties are merged into one automaton that is dumped into the
file Property.text. Fourth, the compiler generates the checker configurations
in the file Property.java. This file contains info used by the TOPL property
and code configuring the checker. Configuration parameters include:

• History length - how many events to remember. A high value reports 
more events that led to error. A low value reduces the space and time 
overhead. 

• Maximum number of active states. A high value increases the chances of 
detecting violations at the cost of higher overhead. 

• Collect call stacks is a flag for including/excluding call stacks in violation 
reports. 

• Approximation method gives different ways in which history information
is forgotten when the maximum amount of information allowed is reached.

Fifth, the compiler instruments methods' bytecode. 5 

8 Experimental Results 

We measure the overhead on the test suite DaCapo [10], version 9.12. DaCapo 
is a collection of automated tests that exercise large portions of code from open- 
source projects and from the Java standard libraries. DaCapo itself was used 
for many experiments by the research community. Hence, we do not expect to 
find any bugs, but aim to measure the overhead. 

We checked two types of properties with TOPL. First, properties that ex- 
press correct usage of the standard Java libraries. Second, properties that ex- 
press temporal constraints which we extracted from the code comments of three 
open-source projects, all of which are included in DaCapo. The latter properties 
are more complex and interesting. 

We measured both time and space overhead. It turns out that space overhead 
is negligible, below the variance caused by the randomness of garbage collection. 
So, we only report time overhead, in Tables 1 and 2. All experiments are 
performed on an Intel i5 machine with 4 cores at 3.33 GHz with 4 GiB of memory 
and running Linux 2.6.32 and Java VM 1.6.0_20. 

8.1 JDK Properties 

These properties are taken from the JavaMOP project [22]. HasNext was briefly
mentioned in Section 4. It checks that no iterator is advanced without first
calling hasNext and having it return true. UnsafeIterator checks that no iterator
is advanced after the iterated collection has been modified. UnsafeMapIterator
checks that no iterator on the keys or values of a map is advanced after the map
has been updated. UnsafeFileWriter checks that no file is written to after it
has been closed.

5 To instrument Java bytecode we use a fork of the library Barista [13].

UnsafeFileWriter    31.0    360.9    1.2

Table 1: Experiment on small properties run on the DaCapo benchmarks (run
in convergence mode). Times are in seconds.

Table 1 presents the running times for DaCapo's projects with and without 
instrumentation of Java's standard library. For each project there are two in- 
strumented runs, one with a maximum of 3 active configurations and one with a 
maximum of 10 active configurations. For properties involving only one object, 
such as HasNext, a limit of 10 means that at most that many objects are tracked 
at the same time. 

8.2 DaCapo Properties 

We focus on 3 projects that are still widely developed and used: Tomcat, H2, 
and PMD. 

All three benchmarks are large open-source projects with many users and 
active developers. The versions were current in 2009 when DaCapo-9.12 was 
released. 

Tomcat 6.0.20. Tomcat is a highly concurrent servlet server. Servlets are
Java programs running in a webserver, extracting data from ServletRequests,
and sending data to ServletResponses. A response has two associated incoming
channels: a stream and a writer. The two should not be used concurrently. A
response could be forwarded once the servlet sent data to it.

But the servlet, before forwarding the response, must call flush on the
stream, on the writer, or on the response itself. This is one of two properties
we checked. The other is given an anecdotal description at the end of the section.

                 Number of tracked active states
          reference    ≤ 0        ≤ 10^1        ≤ 10^2
tomcat    5.3±0.1      5.4±0.1    5.6±0.2       9.0±0.3
pmd       5.2±0.4      5.4±0.2    12.2±0.3      47.7±10.7
h2        6.6±0.2      9.5±0.2    130.1±12.2    timeout

Table 2: Experimental Results. Times are in seconds, averaged over 10 runs.

H2 1.2.121. H2 is a database server. We checked four properties for it. Three
properties express constraints on the order in which the methods of an interface
may be called. For example, a client should not attempt to ask for a row from
a cursor that did not advance; that is, "for the interface Cursor, the method
next must be called before calling method getSearchRow". The fourth property
describes the contract of the class SysConstants. This class reads environment
variables when it is loaded. Its comments say that "it is legal for a program to set
these environment variables using the method System.setProperty, provided
that this is done before the Java VM loads the class SysConstants". It is easy
to specify this property because TOPL intentionally treats the special methods
<init> and <clinit> as if they are normal methods.

PMD 4.2.5. PMD looks for bugs, dead code, and a few other problems in 
Java code. We checked five properties for it. One of the five involves two objects: 
"A scope should be asked what is the definition of a name only if it already replied 
that it knows the name". 

Table 2 summarizes the experimental results. The reference column shows
the running time without any instrumentation. A timeout signifies a time
greater than 100 times the reference time. All reported times are averaged over
10 runs, and are accompanied by the unbiased estimate of standard deviation.
In theory the set of active states is unbounded, but we impose a size limit on
it. The table reports the running times for the instrumented bytecode when
the size limit is 0, 10^1, 10^2. For h2 the running time is more than 1.4 times the
reference time even when the TOPL checker is not tracking any active state.
This indicates that we instrumented a simple method that runs often.
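
The quantities reported in Table 2 can be computed as in the sketch below. The function names are hypothetical; statistics.stdev from the Python standard library computes the sample standard deviation (with n - 1 in the denominator), which is the square root of the unbiased variance estimate.

```python
from statistics import mean, stdev


def summarize(times):
    """Mean and sample standard deviation of a list of running times."""
    return mean(times), stdev(times)


def slowdown(time, reference):
    """Ratio between an instrumented running time and the reference time."""
    return time / reference


def is_timeout(time, reference):
    """A run counts as a timeout when it exceeds 100 times the reference."""
    return time > 100 * reference
```

For h2 with a size limit of 0, slowdown(9.5, 6.6) is about 1.44, which matches the factor mentioned above.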

A Bug's Death. The interface ServletResponse from Tomcat contains meth- 
ods getWriter and getOutputStream. The documentation of getWriter states 
that "either this method or getOutputStream may be called to write the body, 
not both". We initially interpreted this comment as follows. 

It is illegal to call both the method getWriter and the method getOutputStream 
on the same instance of ServletResponse. (5) 

The TOPL checker reported many violations of (5). We were not too surprised 
because we were unsure of the intended meaning of the comment. We were 
surprised, however, that sometimes a null dereference occurred in DaCapo itself. 
We investigated. The statement that threw a null-pointer exception was the 
following: 

if (log != null) log.write(b); (6) 

It must be that a concurrent thread sets log to null. The statement is located 
in the method write of the class TeeOutputStream. The only method that sets 
log to null is closeLog, from the same class. So, we wrote a TOPL property 
saying that executions of write and closeLog must not overlap in time, and 
we asked for error reports that include call stacks. Indeed, DaCapo's main loop 
sometimes calls closeLog while the Tomcat server is printing. 
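
The non-overlap property can be pictured with a pair of counters updated at method entry and exit. The Java sketch below is an illustration under assumptions, not the TOPL property we wrote and not the checker's implementation. The class OverlapMonitor and its four hooks are hypothetical, and the instrumentation calling them on entry to and exit from write and closeLog is assumed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical monitor: executions of write and closeLog must not overlap.
final class OverlapMonitor {
    private final AtomicInteger activeWrites = new AtomicInteger();
    private final AtomicInteger activeCloses = new AtomicInteger();
    private final AtomicInteger violations = new AtomicInteger();

    // Hook assumed to run on entry to write.
    void enterWrite() {
        activeWrites.incrementAndGet();
        if (activeCloses.get() > 0) {
            violations.incrementAndGet();
        }
    }

    // Hook assumed to run when write returns or throws.
    void exitWrite() {
        activeWrites.decrementAndGet();
    }

    // Hook assumed to run on entry to closeLog.
    void enterCloseLog() {
        activeCloses.incrementAndGet();
        if (activeWrites.get() > 0) {
            violations.incrementAndGet();
        }
    }

    // Hook assumed to run when closeLog returns or throws.
    void exitCloseLog() {
        activeCloses.decrementAndGet();
    }

    int violations() {
        return violations.get();
    }

    public static void main(String[] args) {
        OverlapMonitor monitor = new OverlapMonitor();
        // Simulated interleaving: closeLog starts while write is running.
        monitor.enterWrite();
        monitor.enterCloseLog();
        monitor.exitCloseLog();
        monitor.exitWrite();
        System.out.println("violations: " + monitor.violations());
    }
}
```

Each hook increments its own counter before reading the other, so an overlap is seen by at least one side; both sides may see it, in which case the same overlap is counted twice.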

TOPL helped us find a concurrency bug in the infrastructure of DaCapo. 
First, the bytecode instrumented by TOPL helped us notice the problem. The 
reason is not that a certain interleaving became more likely. Bug 2934521 of 
DaCapo mentions that exceptions of the Tomcat benchmark are silently ignored. 
This is why the null-dereference was not noticed before. The TOPL checker 
tried to report an unrelated issue to System.err. DaCapo had redirected the 
standard error stream to its own class, which faulted. TOPL caught the error 
and reported it as an internal issue, because no exceptions should be thrown 
while running the TOPL checker. Second, TOPL helped us identify the data 
race. It would have been more difficult to figure out which threads exactly used 
the stream simultaneously. 

We then tried a weaker version of property (5): 

Stream and writer of a response cannot both be used. (7) 

This property 6 involves three related objects. It turns out this property is also 
violated. After reading the code to which the error traces pointed, we 
noticed that the comments of some implementing classes weaken the contract: 
They say that the stream and the response cannot be used without an interven- 
ing reset. In this case the comments on the interface ServletResponse are 
arguably wrong, rather than the code. This brings out another advantage of 
TOPL: Since TOPL properties are processed by tools, they are less likely to 
become out-of-sync with the code. 
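
To make the three objects of property (7) concrete, the following Java sketch relates a response to the stream and the writer obtained from it, and flags a use once both have been used. It is a simplified illustration, not the TOPL property itself: the class ResponseBodyMonitor and its hooks are hypothetical, the instrumentation is assumed, and the weaker contract with an intervening reset is not modelled.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical monitor for property (7); hook names are assumptions.
final class ResponseBodyMonitor {
    // What is known about one response: which of its channels were used.
    private static final class Usage {
        boolean streamUsed;
        boolean writerUsed;
    }

    // Links a stream or writer back to the response that produced it.
    private static final class Binding {
        final Usage usage;
        final boolean isWriter;

        Binding(Usage usage, boolean isWriter) {
            this.usage = usage;
            this.isWriter = isWriter;
        }
    }

    private final Map<Object, Usage> usageByResponse = new IdentityHashMap<>();
    private final Map<Object, Binding> bindingByChannel = new IdentityHashMap<>();
    private int violations = 0;

    // Hook assumed to run when response.getOutputStream() returns stream.
    void onGetOutputStream(Object response, Object stream) {
        bind(response, stream, false);
    }

    // Hook assumed to run when response.getWriter() returns writer.
    void onGetWriter(Object response, Object writer) {
        bind(response, writer, true);
    }

    private void bind(Object response, Object channel, boolean isWriter) {
        Usage usage = usageByResponse.computeIfAbsent(response, r -> new Usage());
        bindingByChannel.put(channel, new Binding(usage, isWriter));
    }

    // Hook assumed to run when a stream or a writer is written to.
    void onUse(Object channel) {
        Binding binding = bindingByChannel.get(channel);
        if (binding == null) {
            return; // not obtained from a monitored response
        }
        if (binding.isWriter) {
            binding.usage.writerUsed = true;
        } else {
            binding.usage.streamUsed = true;
        }
        if (binding.usage.streamUsed && binding.usage.writerUsed) {
            violations++; // counts every use made once both were used
        }
    }

    int violations() {
        return violations;
    }
}
```

Unlike property (5), merely obtaining both the stream and the writer is not reported; only using both is.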

9 Related Work 

TOPL automata are a variant of register automata [21]. We proved that several 
decidability results known for register automata also hold for TOPL automata. 
Register automata work with infinite alphabets, as do a few other types of 
automata [28]. Until now, such automata found applications mostly related 
to databases and XML. Intuitively, however, such models are a good fit for 
expressing properties of programs because the set of program values is infinite. 

6 http://goo.gl/V008y 
For example, objects could be instantiated inside non-terminating loops. Our 
paper exhibits a working runtime verification system based on register automata, 
thus validating the intuition. 

QVM [4] is a system for runtime monitoring of applications that are deployed. 
As such, QVM's main goal is efficiency, which it achieves because (1) it 
is carefully implemented inside a Java virtual machine, (2) it checks properties 
involving a single object, (3) it adapts on-the-fly the rate of sampling objects to 
monitor, such that the overhead remains within a given budget. On the other 
hand, TOPL aims to help during development and testing. Thus, having a 
precise and expressive language for specifying temporal properties takes 
precedence over efficiency. For example, TOPL can express properties involving many 
objects, even if monitoring such properties typically imposes a bigger overhead. 
Both QVM and TOPL let the programmer tune the overhead/coverage balance. 
In QVM's case the programmer directly chooses the percentage of time spent 
monitoring, which is intuitive; in TOPL's case the programmer chooses a more 
abstract parameter, the number of active configurations, which correlates with 
the overhead. 

JavaMOP [22] and tracematches [2] are the main runtime verification sys- 
tems that handle multiple objects. They differ in many details, but the essential 
ideas fit a common intuition. To communicate this intuition we depart from the 
terminologies of [22] and [2]. The alphabet is $\Sigma = C \times V$, where $C$ is a finite 
set of constants and $V$ is an infinite set of values. The task is to build a 
recognizer $\Sigma^* \to \mathbb{B}$. Both approaches build upon a basic recognizer 
$C^* \to \mathbb{B}$ over finite alphabets. Tracematches specify this recognizer by 
regular expressions, while JavaMOP supports several other logics. Both approaches 
make further assumptions on the structure of $V$, and then define a way to specify 
a family $P \subseteq \Sigma \to \mathbb{B}$ of predicates over letters. Each such 
predicate $p \in P$ determines a slice $\tau_p \in C^*$ of the trace 
$\tau \in \Sigma^*$ by retaining exactly those constants from 
letters satisfying $p$. The trace $\tau$ is recognized when any of its slices satisfy the 
basic recognizer. Within such a framework, it seems difficult to express some 
of our examples, which naturally use an unbounded number of assignments to 
registers. 
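
The slicing intuition above can be sketched in Java as follows. This is a toy model of the common framework as described here, not code from JavaMOP or tracematches. The class Slicing and its members are hypothetical names, and the family of predicates is passed as a finite list purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the slicing intuition described in the text.
final class Slicing {
    // A letter of the alphabet: a constant from C paired with a value from V.
    static final class Letter {
        final String constant;
        final Object value;

        Letter(String constant, Object value) {
            this.constant = constant;
            this.value = value;
        }
    }

    // The slice of a trace under p: the constants of exactly those
    // letters that satisfy p, in their original order.
    static List<String> slice(List<Letter> trace, Predicate<Letter> p) {
        List<String> result = new ArrayList<>();
        for (Letter letter : trace) {
            if (p.test(letter)) {
                result.add(letter.constant);
            }
        }
        return result;
    }

    // The trace is recognized when any slice satisfies the basic
    // recognizer, which only ever sees words over the constants.
    static boolean recognizes(List<Letter> trace,
                              List<Predicate<Letter>> family,
                              Predicate<List<String>> basic) {
        for (Predicate<Letter> p : family) {
            if (basic.test(slice(trace, p))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Object c1 = new Object();
        Object c2 = new Object();
        List<Letter> trace = new ArrayList<>();
        trace.add(new Letter("next", c1));
        trace.add(new Letter("getSearchRow", c2));
        List<Predicate<Letter>> family = new ArrayList<>();
        family.add(letter -> letter.value == c1); // slice for object c1
        family.add(letter -> letter.value == c2); // slice for object c2
        // Basic recognizer: the word starts with getSearchRow.
        Predicate<List<String>> basic =
            word -> !word.isEmpty() && word.get(0).equals("getSearchRow");
        System.out.println(recognizes(trace, family, basic));
    }
}
```

Each predicate fixes its binding once and for all, and the basic recognizer never sees values, which is one way to picture why properties that keep reassigning registers do not fit this scheme directly.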

Any algorithm that analyzes or transforms JavaMOP properties must be 
specific to a certain plugin because the formalization of the framework [26] 
imposes no constraint on recognizers $C^* \to \mathbb{B}$. In contrast, any algorithmic 
result discovered for register automata implies one for TOPL properties. For 
example, we know that the emptiness problem is decidable, and we know that it 
is impossible to compute the negation of a property. 

Tracematches [2] are likely to have the same expressivity as JavaMOP with 
the plugin for regular expressions, but we are unaware of a fully formal 
comparison. Tracematches evolved from the work on aspect-oriented programming. 
Unsurprisingly, the user may specify arbitrary code to run when a match is 
found. The formal semantics ignore this arbitrary code and how it could influ- 
ence further matches. 

ConSpec [1] is a language used to describe security policies. Because 
ConSpec automata are deterministic and have only a countable number of states, 
they cannot in principle express the property InvalidateOtherIterators (see 
Figure 3). 

From the techniques used mostly for static verification of object-oriented 
programs, typestates [29] are probably most similar to TOPL. A modular static 
verification method for typestate protocols is introduced in [9]. The specifica- 
tion method is based on linear logic and relations among objects in the protocol 
are monitored by a tailored system of permissions. The specification of the 
interactions among objects by means of permissions adds an extra level of ma- 
chinery which increases the gap between the intuitive protocol description and 
its formalization. Similarly, [14, 8] provide a means to specify typestate 
properties that belong to a single object. The specified properties are reminiscent 
of contracts or pre/post-conditions for methods and can deal with inheritance. 
In [17] the authors present sound verification techniques for typestate properties 
of Java programs. Their approach is divided into several stages, with different 
verifiers varying in cost and precision. In the early stages efficient but imprecise 
analyses are employed whereas more expensive and precise techniques are then 
progressively employed in later stages. Every stage focuses on verifying only the 
parts of the code that previous stages failed to verify. It is likely that TOPL 
could be fruitfully combined with their analysis technique. 

A specification language for interface checking aimed at C programs (called 
SLIC) is introduced in [6]. Differences between SLIC and TOPL include: the 
use (in SLIC) of non-determinism to encode universal quantification of dynam- 
ically allocated data, and the ability to have complex code in the automaton 
transitions. TOPL specifications naturally express universally quantified 
properties over data structures and, for computability reasons, we have chosen to 
limit the actions performed during automaton transitions. Simple SLIC 
specifications are verified by the SLAM verifier [5]. While SLAM specialises in device 
drivers and checks client conformance rather than full protocols, very general 
specifications of object-oriented program behaviour can be given in JML [20] 
and Spec# [7]. However, the latter two languages focus on class specifications 
and do not have temporal features. 

Finally, there are similar investigations going on in the functional program- 
ming community. In [15] contracts are used to express legal traces of programs 
in a functional language with references. The contracts specify traces as regular 
expressions over calls and returns, and hence, they resemble our automata, in 
a quite different setting. Here, the specifications are function-centered, though, 
and again, capturing inter-object relations seems somewhat tricky. 

10 Conclusions and Future Work 

In this paper we have introduced TOPL, a new runtime verification system 
aimed at checking temporal safety properties of object-oriented programs. For 
TOPL we have defined solid formal foundations based on register automata. 
Moreover, we have studied the formal correspondence between TOPL automata 
and register automata, proving that they are equally expressive. TOPL has been 
implemented in a practical tool that we have applied to a variety of programs, 
including large open source Java projects. 

In the future, we intend to combine TOPL with separation logic [25] in order 
to deal with the heavy use of the heap and aliasing in object-oriented software. 
Moreover, we aim at developing static analysis techniques for TOPL properties 
using the jStar framework [16]. This will require investigating suitable 
abstractions for obtaining meaningful and precise over-approximations of the state space 
of the programs. Finally, we intend to develop a tailored bi-abduction inference 
technology [12] which would help with scalability of the analysis. 